Fault Management in Map-Reduce Through Early Detection of Anomalous Nodes

Authors

  • Selvi Kadirvel
  • Jeffrey Ho
  • José A. B. Fortes
Abstract

Map-Reduce frameworks such as Hadoop have built-in fault-tolerance mechanisms that allow jobs to run to completion even in the presence of certain faults. However, these jobs can experience severe performance penalties under faulty conditions. In this paper, we present Fault-Managed Map-Reduce (FMR), which augments Hadoop with the functionality to mitigate job execution time penalties. FMR uses an anomaly detection algorithm based on sparse coding to anticipate a faulty slave node. The proposed technique has the following key advantages: (1) model training uses only normal-class data, (2) the time taken for prediction is less than a second, and (3) confidence estimates are produced along with the anomaly prediction. FMR uses the result of anomaly detection to invoke a closed-loop recovery action, namely dynamic resource scaling. A scaling heuristic is proposed to determine the extent of scaling necessary to reduce the impending performance penalty. FMR facilitates practical adoption by being implemented as a set of libraries and scripts that require no changes to the underlying source code of Hadoop. A set of realistic Map-Reduce applications was studied through a few thousand job executions on a 72-node Hadoop testbed. Detailed empirical evaluation shows that FMR successfully mitigates performance penalties from 119% down to 14%, averaged across experiments.
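
The abstract does not spell out the sparse-coding detector, so the following is only a minimal sketch of the general idea, not FMR's algorithm. It assumes a dictionary built directly from normal-class samples, uses greedy matching pursuit as a stand-in sparse coder, and scores a node by how poorly its metric vector is reconstructed. The four-element metric vectors, the synthetic data, the sparsity level k=2, and the 1.5x threshold margin are all hypothetical choices made for illustration. Confidence estimates, which the abstract mentions, are not modeled here.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def unit(v):
    """Scale a non-zero vector to unit length."""
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]


def anomaly_score(x, atoms, k=2):
    """Relative residual norm after greedy matching pursuit with k unit-norm atoms."""
    residual = list(x)
    for _ in range(k):
        coeffs = [dot(residual, a) for a in atoms]
        # Pick the atom most correlated with what is still unexplained.
        j = max(range(len(atoms)), key=lambda i: abs(coeffs[i]))
        residual = [r - coeffs[j] * a for r, a in zip(residual, atoms[j])]
    return math.sqrt(dot(residual, residual) / dot(x, x))


def normal_sample(rng):
    """Hypothetical healthy-node metrics: a fixed usage profile scaled by load."""
    base = [0.6, 0.5, 0.3, 0.2]  # assumed: cpu, memory, disk I/O, network
    load = rng.uniform(0.5, 1.5)
    return [load * b + rng.gauss(0.0, 0.01) for b in base]


rng = random.Random(0)

# Training uses normal-class data only: the dictionary is a set of
# normalized healthy samples (an assumption; learned dictionaries also work).
atoms = [unit(normal_sample(rng)) for _ in range(200)]

# Calibrate the threshold on held-out normal data, with an assumed margin.
validation = [normal_sample(rng) for _ in range(100)]
threshold = 1.5 * max(anomaly_score(v, atoms) for v in validation)


def is_anomalous(x):
    return anomaly_score(x, atoms) > threshold


# A made-up metric vector whose shape differs from the healthy profile.
suspect = [0.1, 0.9, 0.05, 0.9]
print("suspect flagged:", is_anomalous(suspect))
print("healthy flagged:", is_anomalous(validation[0]))
```

Because healthy vectors lie close to the span of a few dictionary atoms, their residual stays small, while a vector with a different shape leaves a large unexplained component. In a closed-loop setting, a flag like this would be the signal that triggers a recovery action such as resource scaling.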

Publication date: 2013